ECNU at SemEval-2016 Task 1: Leveraging Word Embedding From Macro and Micro Views to Boost Performance for Semantic Textual Similarity
Abstract
This paper presents our submissions to the semantic textual similarity task in SemEval 2016. Building on several traditional features (i.e., string-based, corpus-based, machine translation similarity, and alignment metrics), we leverage word embeddings from a macro view (i.e., first obtain a representation of each sentence, then measure the similarity of the sentence pair) and a micro view (i.e., measure the similarity of word pairs separately) to boost performance. Because the training and test data come from various domains, we adopt three different strategies: 1) U-SEVEN: an unsupervised model that uses seven straightforward metrics; 2) S1-All: using all available datasets; 3) S2: selecting the most similar training sets for each test set. Results on the test sets show that the unified supervised model (i.e., S1-All) achieves the best average performance, with a mean correlation of 75.07%.
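The macro and micro views described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy embedding table `EMB`, the averaging used to build a sentence vector, and the best-match aggregation over word pairs are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

# Toy 3-dimensional word embeddings. The values are invented for illustration;
# a real system would load pretrained vectors instead.
EMB = {
    "cat": [1.0, 0.0, 0.0],
    "kitten": [0.9, 0.1, 0.0],
    "sits": [0.0, 1.0, 0.0],
    "rests": [0.0, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def macro_similarity(tokens1, tokens2, emb):
    """Macro view: build one vector per sentence, then compare the two vectors.

    Assumption: the sentence vector is the plain average of its word vectors.
    """
    def sentence_vector(tokens):
        vecs = [emb[t] for t in tokens if t in emb]
        if not vecs:
            return None
        dim = len(vecs[0])
        return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

    v1 = sentence_vector(tokens1)
    v2 = sentence_vector(tokens2)
    if v1 is None or v2 is None:
        return 0.0
    return cosine(v1, v2)


def micro_similarity(tokens1, tokens2, emb):
    """Micro view: compare word pairs separately, then aggregate the scores.

    Assumption: each word is matched to its most similar word in the other
    sentence, and the two directed averages are combined symmetrically.
    """
    def directed(src, dst):
        src = [t for t in src if t in emb]
        dst = [t for t in dst if t in emb]
        if not src or not dst:
            return 0.0
        best = [max(cosine(emb[s], emb[d]) for d in dst) for s in src]
        return sum(best) / len(best)

    return 0.5 * (directed(tokens1, tokens2) + directed(tokens2, tokens1))


s1 = ["cat", "sits"]
s2 = ["kitten", "rests"]
print(round(macro_similarity(s1, s2, EMB), 4))
print(round(micro_similarity(s1, s2, EMB), 4))
```

In a feature-based system such as the one the abstract describes, scores like these would be used alongside the traditional features rather than on their own.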
Similar resources
ECNU: Using Traditional Similarity Measurements and Word Embedding for Semantic Textual Similarity Estimation
This paper reports our submissions to the semantic textual similarity task, i.e., task 2 in Semantic Evaluation 2015. We built our systems using various traditional features, such as string-based, corpus-based and syntactic similarity metrics, as well as novel similarity measures based on distributed word representations, which were trained using deep learning paradigms. Since the training and test...
RICOH at SemEval-2016 Task 1: IR-based Semantic Textual Similarity Estimation
This paper describes our IR (Information Retrieval) based method for SemEval 2016 task 1, Semantic Textual Similarity (STS). The main feature of our approach is to extend a conventional IR-based scheme by incorporating word alignment information. This enables us to develop a more fine-grained similarity measurement. In the evaluation results, we have seen that the proposed method improves upon ...
SERGIOJIMENEZ at SemEval-2016 Task 1: Effectively Combining Paraphrase Database, String Matching, WordNet, and Word Embedding for Semantic Textual Similarity
This paper describes a system for semantic textual similarity that participated in Task 1 of SemEval 2016 (monolingual and crosslingual sub-tasks). The system contains a preprocessing step that simplifies text using PPDB 2.0 and detects negations. In addition, six lexical similarity functions were constructed using string matching, word embeddings, and synonym-antonym relations in WordNet. The...
QLUT at SemEval-2017 Task 1: Semantic Textual Similarity Based on Word Embeddings
This paper reports the details of our submissions to Task 1 of SemEval 2017. This task aims at assessing the semantic textual similarity of two sentences or texts. We submit three unsupervised systems based on word embeddings. The runs differ in the preprocessing applied to the evaluation data. The best performance of these systems on the evaluation of Pearson correlation is...
DalGTM at SemEval-2016 Task 1: Importance-Aware Compositional Approach to Short Text Similarity
This paper describes our system submission to the SemEval 2016 English Semantic Textual Similarity (STS) shared task. The proposed system is based on the compositional text similarity model, which aggregates pairwise word similarities for computing the semantic similarity between texts. In addition, our system combines word importance and word similarity to build an importance-similarity matrix...